Exercise 5

  1. Get a polygons map of the lowest administrative unit possible.
In [ ]:
# Spatial distribution of the local administration units
import geopandas as gpd

dkMapaDistLink="https://github.com/Guille20241/CDE/raw/main/maps/whosonfirst-data-admin-dk-latest/whosonfirst-data-admin-dk-localadmin-polygon.shp"

mapdis=gpd.read_file(dkMapaDistLink)
mapdis.rename(columns={'name': 'Municipalidad'}, inplace=True)

mapdis.shape
Out[ ]:
(99, 56)
  1. Get a table of variables for those units. At least 3 numerical variables.

AND

  1. Preprocess both tables and get them ready for merging.
In [ ]:
import pandas as pd
pd.set_option('display.max_columns', 100)

# VARIABLE: REPORTED CRIMES
dkDataLink="https://github.com/Guille20241/CDE/raw/main/data/reportedcriminaloffencesbyregionandtime2023Q.xlsx"
datadis_crimen=pd.read_excel(dkDataLink, dtype={'Ubigeo': object})
datadis_crimen.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110 entries, 0 to 109
Data columns (total 2 columns):
 #   Column                                         Non-Null Count  Dtype 
---  ------                                         --------------  ----- 
 0   Reported criminal offences by region and time  108 non-null    object
 1   Unnamed: 1                                     107 non-null    object
dtypes: object(2)
memory usage: 1.8+ KB
In [ ]:
# rename the two columns
datadis_crimen = datadis_crimen.rename(columns={
    datadis_crimen.columns[0]: 'Municipalidad',
    datadis_crimen.columns[1]: 'crimenes_reportados_2023Q3',
})
In [ ]:
# drop the leading rows we do not need and reset the index
datadis_crimen.drop([0, 1, 2], axis= 0, inplace=True)
datadis_crimen.reset_index(drop=True, inplace = True)
In [ ]:
datadis_crimen
Out[ ]:
     Municipalidad                                      crimenes_reportados_2023Q3
0    Region Hovedstaden                                 44093
1    Copenhagen                                         22195
2    Frederiksberg                                      1862
3    Dragør                                             140
4    Tårnby                                             1839
...  ...                                                ...
102  Vesthimmerlands                                    1175
103  Aalborg                                            3783
104  Unknown municipality                               9511
105  NaN                                                NaN
106  The provisions of the Danish Criminal Code reg...  NaN

107 rows × 2 columns

In [ ]:
# VARIABLE: LIFE EXPECTANCY
dkDataLink="https://github.com/Guille20241/CDE/raw/main/data/Lifeexpentancyfornewbornbabiesbysex%2Cregionandtime.xlsx"
datadis_vida=pd.read_excel(dkDataLink, dtype={'Ubigeo': object})
#datadis_vida.head()
In [ ]:
#datadis_vida[~datadis_vida[datadis_vida.columns[0]].isna()] # locate the third False
In [ ]:
# drop the first column, then keep the municipality rows (positions 3 to 100)
datadis_vida = datadis_vida.drop(datadis_vida.columns[0], axis=1)
datadis_vida = datadis_vida[3:101]
datadis_vida.reset_index(drop=True, inplace=True)

# keep only the first column (name) and the last column (latest value)
datadis_vida.drop(datadis_vida.columns[1 : len(datadis_vida.columns)-1], axis=1, inplace=True)
datadis_vida = datadis_vida.rename(columns={
    datadis_vida.columns[0]: 'Municipalidad',
    datadis_vida.columns[1]: 'Life_excpectancy_2023',
})
datadis_vida
Out[ ]:
    Municipalidad    Life_excpectancy_2023
0   Copenhagen       80.4
1   Frederiksberg    82.3
2   Dragør           82.5
3   Tårnby           80.9
4   Albertslund      81.4
... ...              ...
93  Mariagerfjord    81.7
94  Morsø            80.7
95  Rebild           81.9
96  Thisted          80.5
97  Vesthimmerlands  80.9

98 rows × 2 columns

In [ ]:
# VARIABLE: POPULATION

dkDataLink="https://github.com/Guille20241/CDE/raw/main/data/municipalitiespopulation.xlsx"
datadis_pob=pd.read_excel(dkDataLink, dtype={'Ubigeo': object})
datadis_pob.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99 entries, 0 to 98
Data columns (total 6 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   LAU-1                  99 non-null     object
 1   Municipality           98 non-null     object
 2   Administrative Center  98 non-null     object
 3   Total Area             99 non-null     object
 4   Population             99 non-null     object
 5   Region                 98 non-null     object
dtypes: object(6)
memory usage: 4.8+ KB
In [ ]:
# only the municipality and population columns are needed
datadis_pob = datadis_pob[['Municipality', 'Population']][1:]
datadis_pob.reset_index(drop=True, inplace = True)
datadis_pob.rename(columns={'Municipality': 'Municipalidad', 'Population':'Poblacion'}, inplace=True)
datadis_pob
Out[ ]:
    Municipalidad  Poblacion
0   Copenhagen     549050
1   Aarhus         314545
2   Aalborg        201142
3   Odense         191610
4   Esbjerg        115112
... ...            ...
93  Langeland      13094
94  Ærø            6636
95  Samsø          3889
96  Fanø           3251
97  Læsø           1897

98 rows × 2 columns

  1. Do the merging, making the changes needed so that you keep the most columns.
In [ ]:
set(datadis_crimen.Municipalidad) - set(datadis_vida.Municipalidad) - set(datadis_pob.Municipalidad)
# these entries are not needed because they sit at a different level (regions, footnotes, unknowns)
# since all three tables come from the same source, no fuzzy matching is needed at this step
Out[ ]:
{'Region Hovedstaden',
 'Region Midtjylland',
 'Region Nordjylland',
 'Region Sjælland',
 'Region Syddanmark',
 'The provisions of the Danish Criminal Code regarding sexual offences went through essential amendments taking effect from 1 July 2013. The amendments resulted in e.g. more categories of sexual offences than previously being placed under the provisions about rape (section 216). See more in the documentation of statistics, in the chapter Comparability: http://www.dst.dk/declarations//c1ac7749-1e15-4d3a-8ed0-fb2d26a9fe93 ',
 'Unknown municipality',
 nan}
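The set difference above only looks for names that appear in the crime table and are missing elsewhere. A minimal sketch of a two-way check is shown below; it uses made-up toy lists instead of the real tables, and the helper `unmatched_keys` is hypothetical, not part of the notebook.

```python
def unmatched_keys(left, right):
    """Return (only_in_left, only_in_right) as sorted lists, skipping non-string keys such as NaN."""
    left_set = {k for k in left if isinstance(k, str)}
    right_set = {k for k in right if isinstance(k, str)}
    return sorted(left_set - right_set), sorted(right_set - left_set)

# toy data, assumed for illustration only
crime_names = ['Region Hovedstaden', 'Copenhagen', 'Dragør', float('nan')]
life_names = ['Copenhagen', 'Dragør', 'Aarhus']

only_crime, only_life = unmatched_keys(crime_names, life_names)
print(only_crime)
print(only_life)
```

Checking both directions makes it visible when the second table holds units that the first one lacks, which an inner merge would otherwise drop silently.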
In [ ]:
# MERGE THE THREE VARIABLES INTO ONE TABLE
df1 = pd.merge(datadis_crimen, datadis_vida, on='Municipalidad', how='inner')
datadis_merged = pd.merge(df1, datadis_pob, on='Municipalidad', how='inner')
datadis_merged
Out[ ]:
    Municipalidad  crimenes_reportados_2023Q3  Life_excpectancy_2023  Poblacion
0   Copenhagen     22195                       80.4                   549050
1   Frederiksberg  1862                        82.3                   100215
2   Dragør         140                         82.5                   13692
3   Tårnby         1839                        80.9                   41151
4   Albertslund    603                         81.4                   27864
... ...            ...                         ...                    ...
90  Læsø           12                          ..                     1897
91  Mariagerfjord  467                         81.7                   42429
92  Morsø          141                         80.7                   21474
93  Rebild         252                         81.9                   28911
94  Thisted        428                         80.5                   44908

95 rows × 4 columns

In [ ]:
# FINAL MERGE WITH THE MAP (mapdis)

mapDataDis = pd.merge(datadis_merged, mapdis, on='Municipalidad', how='inner')
mapDataDis = mapDataDis[['Municipalidad', 'geometry']]
mapDataDis.info()
# only 66 of the 95 records match, so fuzzy merging is worthwhile here
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   Municipalidad  66 non-null     object  
 1   geometry       66 non-null     geometry
dtypes: geometry(1), object(1)
memory usage: 1.2+ KB
In [ ]:
pip install thefuzz
In [ ]:
from thefuzz import process
# names in the data that have no exact match in the map
nomatch = set(datadis_merged.Municipalidad)- set(mapdis.Municipalidad)
# best fuzzy candidate for each unmatched name: (match, score, index)
[(dis,process.extractOne(dis, mapdis.Municipalidad)) for dis in sorted(nomatch)]
Out[ ]:
[('Allerød', ('Allerod', 92, 81)),
 ('Brøndby', ('Brondby', 92, 96)),
 ('Brønderslev', ('Bronderslev', 95, 16)),
 ('Dragør', ('Dragor', 91, 55)),
 ('Fanø', ('Fano', 86, 58)),
 ('Furesø', ('Fureso', 91, 56)),
 ('Halsnæs', ('Halsnaes', 86, 10)),
 ('Helsingør', ('Helsingor', 94, 35)),
 ('Hillerød', ('Hillerod', 93, 67)),
 ('Hjørring', ('Hjorring', 93, 97)),
 ('Holbæk', ('Holbaek', 83, 0)),
 ('Høje-Taastrup', ('Hoje-Taastrup', 96, 85)),
 ('Hørsholm', ('Horsholm', 93, 69)),
 ('Ishøj', ('Ishoj', 89, 94)),
 ('Køge', ('Koge', 86, 41)),
 ('Lyngby-Taarbæk', ('Lyngby-Taarbaek', 93, 3)),
 ('Læsø', ('Halsnaes', 90, 10)),
 ('Morsø', ('Morso', 89, 32)),
 ('Næstved', ('Naestved', 86, 20)),
 ('Ringkøbing-Skjern', ('Ringkobing-Skjern', 97, 38)),
 ('Rødovre', ('Rodovre', 92, 86)),
 ('Samsø', ('Samso', 89, 8)),
 ('Solrød', ('Solrod', 91, 47)),
 ('Sorø', ('Soro', 86, 73)),
 ('Sønderborg', ('Sonderborg', 95, 51)),
 ('Tårnby', ('Tarnby', 91, 28)),
 ('Tønder', ('Tonder', 91, 70)),
 ('Vallensbæk', ('Vallensbaek', 90, 76)),
 ('Ærø', ('Herning', 90, 4))]
In [ ]:
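# NOTE: this accepts the top candidate blindly; 'Læsø' -> 'Halsnaes' and 'Ærø' -> 'Herning'
# are wrong matches and should be corrected by hand before replacing
changes={dis:process.extractOne(dis,mapdis.Municipalidad)[0] for dis in sorted(nomatch)}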
changes
Out[ ]:
{'Allerød': 'Allerod',
 'Brøndby': 'Brondby',
 'Brønderslev': 'Bronderslev',
 'Dragør': 'Dragor',
 'Fanø': 'Fano',
 'Furesø': 'Fureso',
 'Halsnæs': 'Halsnaes',
 'Helsingør': 'Helsingor',
 'Hillerød': 'Hillerod',
 'Hjørring': 'Hjorring',
 'Holbæk': 'Holbaek',
 'Høje-Taastrup': 'Hoje-Taastrup',
 'Hørsholm': 'Horsholm',
 'Ishøj': 'Ishoj',
 'Køge': 'Koge',
 'Lyngby-Taarbæk': 'Lyngby-Taarbaek',
 'Læsø': 'Halsnaes',
 'Morsø': 'Morso',
 'Næstved': 'Naestved',
 'Ringkøbing-Skjern': 'Ringkobing-Skjern',
 'Rødovre': 'Rodovre',
 'Samsø': 'Samso',
 'Solrød': 'Solrod',
 'Sorø': 'Soro',
 'Sønderborg': 'Sonderborg',
 'Tårnby': 'Tarnby',
 'Tønder': 'Tonder',
 'Vallensbæk': 'Vallensbaek',
 'Ærø': 'Herning'}
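Two of the fuzzy matches above look wrong: 'Læsø' is paired with 'Halsnaes' and 'Ærø' with 'Herning'. The correct pairs suggest that the map names are plain ASCII transliterations (ø to o, æ to ae, å to a), so one alternative is to transliterate first and then match exactly. The sketch below assumes that this convention holds for every name; the list of map names is hypothetical (the real ones come from `mapdis.Municipalidad`), and the helper `to_ascii_danish` is not part of the notebook.

```python
# assumed transliteration, inferred from pairs such as 'Tårnby' -> 'Tarnby' and 'Halsnæs' -> 'Halsnaes'
DANISH_TO_ASCII = str.maketrans({
    'æ': 'ae', 'ø': 'o', 'å': 'a',
    'Æ': 'Ae', 'Ø': 'O', 'Å': 'A',
})

def to_ascii_danish(name):
    """Replace the Danish letters æ, ø, å with their assumed ASCII spelling."""
    return name.translate(DANISH_TO_ASCII)

# hypothetical map names, for illustration only
map_names = {'Halsnaes', 'Herning', 'Laeso', 'Aero', 'Tarnby'}

changes_exact = {}   # names resolved by transliteration
needs_review = []    # names that still have no exact match
for name in ['Læsø', 'Ærø', 'Tårnby', 'Halsnæs']:
    candidate = to_ascii_danish(name)
    if candidate in map_names:
        changes_exact[name] = candidate
    else:
        needs_review.append(name)

print(changes_exact)
print(needs_review)
```

Anything left in `needs_review` could then go through the fuzzy matcher and be inspected by hand, instead of accepting the top candidate for every name.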
In [ ]:
datadis_merged.replace({'Municipalidad':changes},inplace=True)
In [ ]:
mapDataDis = pd.merge(datadis_merged, mapdis[['Municipalidad', 'geometry']], on='Municipalidad', how='inner')
In [ ]:
import numpy as np

# cast the three variables to numeric types
mapDataDis['crimenes_reportados_2023Q3'] = mapDataDis['crimenes_reportados_2023Q3'].astype(float)
# '..' marks a missing value in the life expectancy table; errors='coerce' turns
# it (and any other non-numeric entry) into NaN
mapDataDis['Life_excpectancy_2023'] = pd.to_numeric(mapDataDis['Life_excpectancy_2023'], errors='coerce')
mapDataDis['Poblacion'] = mapDataDis['Poblacion'].astype(int)
mapDataDis = gpd.GeoDataFrame(mapDataDis, geometry='geometry')
# FINAL MERGE:
mapDataDis.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 95 entries, 0 to 94
Data columns (total 5 columns):
 #   Column                      Non-Null Count  Dtype   
---  ------                      --------------  -----   
 0   Municipalidad               95 non-null     object  
 1   crimenes_reportados_2023Q3  95 non-null     float64 
 2   Life_excpectancy_2023       91 non-null     float64 
 3   Poblacion                   95 non-null     int64   
 4   geometry                    95 non-null     geometry
dtypes: float64(2), geometry(1), int64(1), object(1)
memory usage: 3.8+ KB

Exercise 6

Compute the neighbors of the capital of your country. Plot the results for each of the options.

In [ ]:
pip install libpysal
In [ ]:
from libpysal.weights import Queen, Rook, KNN

# rook
w_rook = Rook.from_dataframe(mapDataDis,use_index=False)
/usr/local/lib/python3.10/dist-packages/libpysal/weights/contiguity.py:61: UserWarning: The weights matrix is not fully connected: 
 There are 17 disconnected components.
 There are 12 islands with ids: 0, 1, 29, 33, 37, 50, 54, 59, 75, 86, 88, 92.
  W.__init__(self, neighbors, ids=ids, **kw)
In [ ]:
# queen
w_queen = Queen.from_dataframe(mapDataDis,use_index=False)
/usr/local/lib/python3.10/dist-packages/libpysal/weights/contiguity.py:347: UserWarning: The weights matrix is not fully connected: 
 There are 17 disconnected components.
 There are 12 islands with ids: 0, 1, 29, 33, 37, 50, 54, 59, 75, 86, 88, 92.
  W.__init__(self, neighbors, ids=ids, **kw)
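The warnings above report 12 islands, that is, polygons with no contiguous neighbor under either criterion. As a reminder of how the two criteria differ, Rook contiguity requires a shared edge, while Queen contiguity accepts a shared edge or a shared vertex. The sketch below illustrates this on a regular grid with plain Python; the function `grid_neighbors` is hypothetical and does not use libpysal.

```python
def grid_neighbors(rows, cols, rule="rook"):
    """Map each cell (r, c) of a rows x cols grid to its contiguous cells."""
    if rule == "rook":
        # cells that share an edge
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        # cells that share an edge or a corner
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            neighbors[(r, c)] = [
                (r + dr, c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
    return neighbors

rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")
print(len(rook[(1, 1)]), len(queen[(1, 1)]))
```

On real polygons the two criteria often agree, as they do here with the same 12 islands, because municipalities rarely touch at a single point only.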
In [ ]:
pip install folium
In [ ]:
pip install matplotlib
In [ ]:
pip install mapclassify
In [ ]:
mapDataDis.iloc[w_queen.islands,:].explore()
In [ ]:
# there are islands, so we approximate the neighborhoods with k-nearest neighbors
# note: despite the name w_knn8, the weights use k=28 neighbors
w_knn8 = KNN.from_dataframe(mapDataDis, k=28)
w_knn8.islands
Out[ ]:
[]
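The empty list above is expected: with k-nearest neighbors every unit receives exactly k neighbors regardless of whether it touches another polygon, so islands cannot occur. A minimal sketch of the idea with plain Python follows; the coordinates are made up, and the use of one representative point per polygon is an assumption about how the distances are measured.

```python
import math

def knn_neighbors(points, k):
    """For each point index, return the indices of its k nearest other points."""
    neighbors = {}
    for i, p in enumerate(points):
        # distance from point i to every other point, sorted from nearest to farthest
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors[i] = [j for _, j in others[:k]]
    return neighbors

# hypothetical representative points; the last one is far from the rest
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
w = knn_neighbors(centroids, k=2)
islands = [i for i, nbrs in w.items() if not nbrs]
print(w)
print(islands)
```

Note that the relation is not symmetric: the remote point 3 has neighbors, but no other point lists it as one of theirs.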
In [ ]:
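# plot the neighbors of the capital; only KNN works here, because Copenhagen (id 0)
# is reported as an island under both Rook and Queen contiguity
base = mapDataDis[mapDataDis.Municipalidad=="Copenhagen"].plot()
mapDataDis.iloc[w_knn8.neighbors[0], :].plot(ax=base, facecolor="yellow", edgecolor='k')
# row 0 is Copenhagen; it is drawn last so that it stays visible in red
mapDataDis.head(1).plot(ax=base, facecolor="red")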
Out[ ]:
<Axes: >

Exercise 7

  1. Compute the Moran's coefficient for one of your three numeric variables.
In [ ]:
pip install pysal
In [ ]:
from esda.moran import Moran

moranCrime = Moran(mapDataDis.crimenes_reportados_2023Q3, w_knn8)
moranCrime.I,moranCrime.p_sim
Out[ ]:
(-0.013725729742002828, 0.408)
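The pair above is the global Moran's I and its permutation-based pseudo p-value: an I of about -0.014 with p_sim = 0.408 is close to zero and not significant at the 0.05 level, so reported crimes show no clear spatial autocorrelation under these weights. To make the statistic itself concrete, the sketch below computes Moran's I by hand on a toy chain of four units. It uses binary weights for simplicity (esda row-standardizes the weights by default), and `morans_i` is a hypothetical helper, not the esda implementation.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights given as {i: [j, ...]}."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                       # deviations from the mean
    s0 = sum(len(nbrs) for nbrs in neighbors.values())   # sum of all weights
    cross = sum(z[i] * z[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / s0) * cross / sum(zi * zi for zi in z)

# four units in a row, each linked to its adjacent units
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

i_smooth = morans_i([1, 2, 3, 4], chain)        # similar values sit next to each other
i_alternating = morans_i([1, 4, 1, 4], chain)   # neighbors always differ
print(i_smooth, i_alternating)
```

A smooth gradient gives a positive value and a strict alternation gives a negative one, which is the contrast the statistic is designed to detect.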
  1. Make a scatter plot for each variable.
In [ ]:
from splot.esda import moran_scatterplot
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))

moran_scatterplot(moranCrime, aspect_equal=True, ax=ax)

ax.set_xlabel('Reported_Crime_std')
ax.set_ylabel('SpatialLag_Reported_Crime_std')

ax.set_xlim(-0.5, 0.5)
ax.set_ylim(-0.25, 0.25)
Out[ ]:
(-0.25, 0.25)
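The two axes of a Moran scatter plot are the standardized variable and its spatial lag, that is, the weighted average of the standardized values of each unit's neighbors. A small sketch of both quantities on toy data follows; it assumes row-standardized weights and the population standard deviation, and `standardized_with_lag` is a hypothetical helper rather than what splot runs internally.

```python
import statistics

def standardized_with_lag(values, neighbors):
    """Return (z, lag): z-scores and the mean z-score of each unit's neighbors."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]
    # with row-standardized weights the lag is the plain average over the neighbors
    lag = [statistics.fmean([z[j] for j in neighbors[i]]) for i in range(len(values))]
    return z, lag

# four units in a row, each linked to its adjacent units
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
z, lag = standardized_with_lag([1, 2, 3, 4], chain)
for zi, li in zip(z, lag):
    print(round(zi, 3), round(li, 3))
```

Because both axes are in standard deviation units, axis limits as narrow as (-0.5, 0.5) can hide units with extreme values; widening them is worth considering when the plot looks empty.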
In [ ]:
fig, ax = plt.subplots(figsize=(10, 6))

moran_scatterplot(Moran(mapDataDis.Poblacion, w_knn8), aspect_equal=True, ax=ax)

ax.set_xlabel('Poblacion_std')
ax.set_ylabel('SpatialLag_Poblacion_std')

ax.set_xlim(-0.5, 0.5)
ax.set_ylim(-0.25, 0.25)
Out[ ]:
(-0.25, 0.25)
In [ ]:
# this variable has missing values, so drop those rows and rebuild the weights
aux = mapDataDis.dropna(subset=['Life_excpectancy_2023'])
w_knn8_aux = KNN.from_dataframe(aux, k=28)
# compute Moran's I once and reuse the same object for the plot
moranLE = Moran(aux.Life_excpectancy_2023, w_knn8_aux)
moranLE.I, moranLE.p_sim

fig, ax = plt.subplots(figsize=(10, 6))

moran_scatterplot(moranLE, aspect_equal=True, ax=ax)

ax.set_xlabel('Life_excpectancy_std')
ax.set_ylabel('SpatialLag_Life_excpectancy_std')

#ax.set_xlim(-4, 4)
#ax.set_ylim(-1.5, 1.5)
Out[ ]:
Text(0, 0.5, 'SpatialLag_Life_excpectancy_std')

Exercise 8¶

  1. Compute the Local Moran for the variables in your data that have significant spatial correlation.
In [ ]:
# we need to compute the LISA (local Moran) statistics

from esda.moran import Moran_Local
lisaLE = Moran_Local(y=aux['Life_excpectancy_2023'], w=w_knn8_aux, seed=2023)
# lisaLE is what the exercise asks for
In [ ]:
fig, ax = moran_scatterplot(lisaLE,p=0.05)
ax.set_xlabel('Life_excpectancy_std')
ax.set_ylabel('SpatialLag_Life_excpectancy_std');
  1. Create a new column for each of those variables, with a label ('0 no_sig', '1 hotSpot', '2 coldOutlier', '3 coldSpot', '4 hotOutlier').
In [ ]:
aux['Life_Expectancy_quadrant']=[l if p <0.05 else 0 for l,p in zip(lisaLE.q,lisaLE.p_sim)  ]
aux['Life_Expectancy_quadrant'].value_counts()
/usr/local/lib/python3.10/dist-packages/geopandas/geodataframe.py:1538: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)
Out[ ]:
Life_Expectancy_quadrant
0    42
1    22
2    17
3    10
Name: count, dtype: int64
In [ ]:
labels = [ '0 no_sig', '1 hotSpot', '2 coldOutlier', '3 coldSpot', '4 hotOutlier']
aux['Life_Expectancy_quadrant_names']=[labels[i] for i in aux['Life_Expectancy_quadrant']]
aux.head()
Out[ ]:
Municipalidad crimenes_reportados_2023Q3 Life_excpectancy_2023 Poblacion geometry Life_Expectancy_quadrant Life_Expectancy_quadrant_names
0 Copenhagen 22195.0 80.4 549050 MULTIPOLYGON (((12.73416 55.70339, 12.73417 55... 2 2 coldOutlier
1 Frederiksberg 1862.0 82.3 100215 POLYGON ((12.52731 55.69556, 12.52732 55.69555... 1 1 hotSpot
2 Dragor 140.0 82.5 13692 MULTIPOLYGON (((12.56371 55.57581, 12.56371 55... 1 1 hotSpot
3 Tarnby 1839.0 80.9 41151 MULTIPOLYGON (((12.73547 55.63006, 12.73561 55... 2 2 coldOutlier
4 Albertslund 603.0 81.4 27864 POLYGON ((12.37471 55.66018, 12.37436 55.66014... 2 2 coldOutlier
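The two labeling steps above can be folded into one. The following is a small sketch, assuming the quadrant codes follow the 1 to 4 numbering used in the exercise labels; `label_quadrants` is a hypothetical helper, and the sample codes and p-values are made up for illustration.

```python
LABELS = ['0 no_sig', '1 hotSpot', '2 coldOutlier', '3 coldSpot', '4 hotOutlier']

def label_quadrants(q, p_sim, alpha=0.05):
    """Map LISA quadrant codes to labels, keeping only significant units."""
    return [LABELS[qi] if pi < alpha else LABELS[0]
            for qi, pi in zip(q, p_sim)]

# Made-up quadrant codes and pseudo p-values for five units
demo = label_quadrants(q=[1, 2, 3, 4, 1],
                       p_sim=[0.01, 0.03, 0.20, 0.04, 0.50])
print(demo)
```

With the notebook objects this would be called as `label_quadrants(lisaLE.q, lisaLE.p_sim)`.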
  1. Prepare a map for each of the variables analyzed, showing the spots and outliers.
In [ ]:
# custom colors
from matplotlib import colors
myColMap = colors.ListedColormap([ 'white', 'pink', 'cyan', 'azure','red'])

# Set up figure and ax
f, ax = plt.subplots(1, figsize=(12,12))
# Plot unique values choropleth including
# a legend and with no boundary lines

plt.title('Spots and Outliers')

aux.plot(column='Life_Expectancy_quadrant_names',
                categorical=True,
                cmap=myColMap,
                linewidth=0.1,
                edgecolor='k',
                legend=True,
                legend_kwds={'loc': 'center left',
                             'bbox_to_anchor': (0.7, 0.6)},
                ax=ax)
# Remove axis
ax.set_axis_off()
# Display the map
plt.show()
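One thing worth checking in the map above: the colormap lists five colors, but the `value_counts()` output shows only four categories (no `'4 hotOutlier'`). As far as we can tell, a categorical plot spreads the categories that are present across the whole colormap, so colors and labels may not line up as intended. A possible workaround, sketched below under that assumption, is to build the color list only from the labels that actually occur; `PALETTE` and `colors_for` are hypothetical names.

```python
# Intended color for every possible label (same colors as in the cell above)
PALETTE = {
    '0 no_sig': 'white',
    '1 hotSpot': 'pink',
    '2 coldOutlier': 'cyan',
    '3 coldSpot': 'azure',
    '4 hotOutlier': 'red',
}

def colors_for(present_labels, palette=PALETTE):
    """Return the colors of the labels that occur, in sorted label order."""
    return [palette[label] for label in sorted(set(present_labels))]

# Labels like the ones found for life expectancy: no '4 hotOutlier'
present = ['0 no_sig', '1 hotSpot', '2 coldOutlier', '3 coldSpot', '1 hotSpot']
print(colors_for(present))
```

The resulting list can then be passed to `colors.ListedColormap(...)` in place of the fixed five-color list.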

Exercise 9¶

Use your three variables to carry out the cluster/regional analysis.

In [ ]:
selected_variables = ['Life_excpectancy_2023',
                     'crimenes_reportados_2023Q3',
                     'Poblacion']
aux[selected_variables].corr()
Out[ ]:
Life_excpectancy_2023 crimenes_reportados_2023Q3 Poblacion
Life_excpectancy_2023 1.000000 -0.087615 -0.056632
crimenes_reportados_2023Q3 -0.087615 1.000000 0.954596
Poblacion -0.056632 0.954596 1.000000
In [ ]:
# standardize the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
normalized_data = scaler.fit_transform(aux[selected_variables])

# new names
selected_variables_new_std=[s+'_std' for s in selected_variables]

# add columns
aux[selected_variables_new_std]=normalized_data
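As a reminder of what the scaling step does, here is a minimal NumPy sketch of column-wise z-scores. It uses the population standard deviation (`ddof=0`), which is what `StandardScaler` uses; `standardize` is an illustrative helper, and the three rows are the values shown earlier for Copenhagen, Frederiksberg and Dragor.

```python
import numpy as np

def standardize(X):
    """Column-wise z-scores using the population standard deviation (ddof=0)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Columns: life expectancy, reported crimes, population
toy = np.array([[80.4, 22195.0, 549050.0],
                [82.3,  1862.0, 100215.0],
                [82.5,   140.0,  13692.0]])

toy_std = standardize(toy)
print(toy_std.round(3))
```

After scaling, every column has mean 0 and standard deviation 1, so population (hundreds of thousands) no longer dominates life expectancy (tens) in the distance computations used by the clustering.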

Conventional clustering

In [ ]:
from scipy.cluster import hierarchy as hc


Z = hc.linkage(aux[selected_variables_new_std], 'ward')
# calculate full dendrogram
plt.figure(figsize=(25, 10))
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('cases')
plt.ylabel('distance')
hc.dendrogram(
    Z,
    leaf_rotation=90.,  # rotates the x axis labels
    leaf_font_size=1,  # font size for the x axis labels
)
plt.show()
In [ ]:
from sklearn.cluster import AgglomerativeClustering as agnes
import numpy as np

np.random.seed(42)# Set seed for reproducibility

# The dendrogram suggests 1 group (a consequence of the low correlation); let's try 4
model = agnes(linkage="ward", n_clusters=4).fit(aux[selected_variables_new_std])

# Assign labels to main data table
aux["hc_ag4"] = model.labels_
In [ ]:
# Set up figure and ax
f, ax = plt.subplots(1, figsize=(9, 9))
# Plot unique values choropleth including
# a legend and with no boundary lines
aux.plot(
    column="hc_ag4", categorical=True, legend=True, linewidth=0, ax=ax
)
# Remove axis
ax.set_axis_off()
# Display the map
plt.show()

Spatial clustering

In [ ]:
# SPATIAL CLUSTERING

# we go straight to k-nearest neighbors with k = 28
from sklearn.cluster import AgglomerativeClustering as agnes

model_knn28 = agnes(linkage="ward",
                    n_clusters=4,
                    connectivity=w_knn8_aux.sparse).fit(aux[selected_variables_new_std])
# Fit algorithm to the data
aux["hc_ag4_wKNN28"] = model_knn28.labels_
In [ ]:
# Set up figure and ax
f, ax = plt.subplots(1, figsize=(9, 9))
# Plot unique values choropleth including a legend and with no boundary lines
aux.plot(
    column="hc_ag4_wKNN28",
    categorical=True,
    legend=True,
    linewidth=0,
    ax=ax,
)
# Remove axis
ax.set_axis_off()
# Display the map
plt.show()

Overall, the two classifications look very similar. Let's compare them on "compactness", measured with the isoperimetric quotient (it ranges from 0 to 1; the closer to 1, the more compact the region).

In [ ]:
from esda import shape as shapestats
results={}
for cluster_type in ("hc_ag4_wKNN28", "hc_ag4"):
    # compute the region polygons using a dissolve
    # the projected CRS for Denmark is EPSG:25832 (ETRS89 / UTM zone 32N)
    regions = aux[[cluster_type, "geometry"]].to_crs(25832).dissolve(by=cluster_type)
    # compute the actual isoperimetric quotient for these regions
    ipqs = shapestats.isoperimetric_quotient(regions)
    # cast to a dataframe
    result = {cluster_type:ipqs}
    results.update(result)
# stack the series together along columns
pd.DataFrame(results)
Out[ ]:
hc_ag4_wKNN28 hc_ag4
0 0.014979 0.014580
1 0.164054 0.164054
2 0.006440 0.006302
3 0.041339 0.041339
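To help read these values, the isoperimetric quotient can be computed by hand as 4πA / P², where A is the area and P the perimeter. A minimal sketch with idealized shapes follows; `ipq` is an illustrative helper, not the esda function.

```python
import math

def ipq(area, perimeter):
    """Isoperimetric quotient: 4*pi*A / P**2 (equals 1 for a circle)."""
    return 4 * math.pi * area / perimeter ** 2

ipq_circle = ipq(math.pi * 1.0 ** 2, 2 * math.pi * 1.0)   # unit circle
ipq_square = ipq(1.0, 4.0)                                 # unit square
ipq_strip = ipq(100.0 * 1.0, 2 * (100.0 + 1.0))            # 100 x 1 rectangle

print(ipq_circle, ipq_square, ipq_strip)
```

Long, thin or ragged shapes score close to 0. The dissolved regions above all score well below the unit square, which is plausible for regions built from coastal municipalities and islands with long boundaries.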

Now we assess how good the fit of each clustering is.

In [ ]:
from sklearn import metrics

fit_scores = []
for cluster_type in ("hc_ag4_wKNN28", "hc_ag4"):
    # compute the CH score
    ch_score = metrics.calinski_harabasz_score(
        # using scaled variables
        aux[selected_variables_new_std],
        # using these labels
        aux[cluster_type],
    )
    sil_score = metrics.silhouette_score(
        # using scaled variables
        aux[selected_variables_new_std],
        # using these labels
        aux[cluster_type],
    )
    # and append the cluster type with both scores
    fit_scores.append((cluster_type, ch_score,sil_score))


# re-arrange the scores into a dataframe for display
pd.DataFrame(
    fit_scores, columns=["cluster type", "CH score", "SIL score"]
).set_index("cluster type")
Out[ ]:
CH score SIL score
cluster type
hc_ag4_wKNN28 101.881809 0.368112
hc_ag4 108.222180 0.398315

From these scores we can see that, in the end, conventional clustering still fits slightly better than spatial clustering (higher CH and silhouette scores).
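To make the CH score less of a black box, here is a small NumPy sketch of the Calinski-Harabasz index: between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom. The function name and the toy points are illustrative only; the notebook itself relies on `metrics.calinski_harabasz_score`.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """CH = [B / (k - 1)] / [W / (n - k)] for data X and cluster labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    groups = np.unique(labels)
    k = groups.size
    overall_mean = X.mean(axis=0)
    between, within = 0.0, 0.0
    for g in groups:
        members = X[labels == g]
        center = members.mean(axis=0)
        # dispersion of the cluster center around the overall mean
        between += members.shape[0] * np.sum((center - overall_mean) ** 2)
        # dispersion of the members around their own center
        within += np.sum((members - center) ** 2)
    return (between / (k - 1)) / (within / (n - k))

# Two obvious groups of points
points = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
ch_good = calinski_harabasz(points, [0, 0, 1, 1])   # labels match the groups
ch_bad = calinski_harabasz(points, [0, 1, 0, 1])    # labels cut across the groups
print(ch_good, ch_bad)
```

Higher is better: tight, well-separated clusters give a large ratio. Forcing clusters to respect spatial connectivity usually costs some of that separation in attribute space, which is in line with the slightly lower scores of `hc_ag4_wKNN28`.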

Exercise 10¶

Use your three variables to carry out regression analysis (conventional and spatial).

In [ ]:
pip install pysal

Conventional regression

In [ ]:
from pysal.model import spreg

dep_var_name=['crimenes_reportados_2023Q3']
ind_vars_names=['Poblacion','Life_excpectancy_2023']


ols_model = spreg.OLS(
    # Dependent variable
    aux[dep_var_name].values,
    # Independent variables
    aux[ind_vars_names].values,
    w=w_knn8_aux,
    spat_diag = True,
    moran=True,
    # Dependent variable name
    name_y=dep_var_name[0],
    # Independent variable name
    name_x=ind_vars_names)

print(ols_model.summary)
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :crimenes_reportados_2023Q3                Number of Observations:          91
Mean dependent var  :   1082.3626                Number of Variables   :           3
S.D. dependent var  :   2401.7287                Degrees of Freedom    :          88
R-squared           :      0.9124
Adjusted R-squared  :      0.9104
Sum squared residual: 4.54858e+07                F-statistic           :    458.1886
Sigma-square        :  516884.535                Prob(F-statistic)     :   2.977e-47
S.E. of regression  :     718.947                Log likelihood        :    -726.177
Sigma-square ML     :  499844.386                Akaike info criterion :    1458.354
S.E of regression ML:    706.9967                Schwarz criterion     :    1465.886

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      5695.08671      6265.83850         0.90891         0.36588
           Poblacion         0.03533         0.00117        30.14411         0.00000
Life_excpectancy_2023       -81.90710        76.89861        -1.06513         0.28973
------------------------------------------------------------------------------------

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER          188.474

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2         308.367           0.0000

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test                2         457.878           0.0000
Koenker-Bassett test              2          83.781           0.0000

DIAGNOSTICS FOR SPATIAL DEPENDENCE
TEST                           MI/DF       VALUE           PROB
Moran's I (error)              0.1757         9.112           0.0000
Lagrange Multiplier (lag)         1           5.415           0.0200
Robust LM (lag)                   1           1.601           0.2058
Lagrange Multiplier (error)       1          44.200           0.0000
Robust LM (error)                 1          40.386           0.0000
Lagrange Multiplier (SARMA)       2          45.801           0.0000

================================ END OF REPORT =====================================
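The Lagrange Multiplier block in this report is usually read with a rule of thumb: if only one of the plain LM tests is significant, go with that model; if both are, look at the robust versions. The sketch below encodes that rule as a hypothetical helper (`suggest_spatial_model` is not part of spreg, and the 0.05 threshold is an assumption), fed with the p-values printed above.

```python
def suggest_spatial_model(p_lm_lag, p_lm_err, p_rlm_lag, p_rlm_err, alpha=0.05):
    """Rule-of-thumb choice between OLS, spatial lag and spatial error models."""
    lag_sig = p_lm_lag < alpha
    err_sig = p_lm_err < alpha
    if not lag_sig and not err_sig:
        return "ols"            # no sign of spatial dependence
    if lag_sig and not err_sig:
        return "lag"
    if err_sig and not lag_sig:
        return "error"
    # both plain tests are significant: fall back on the robust versions
    rlag_sig = p_rlm_lag < alpha
    rerr_sig = p_rlm_err < alpha
    if rerr_sig and not rlag_sig:
        return "error"
    if rlag_sig and not rerr_sig:
        return "lag"
    return "undetermined"       # robust tests agree; needs a closer look

# p-values taken from the OLS report above
choice = suggest_spatial_model(p_lm_lag=0.0200, p_lm_err=0.0000,
                               p_rlm_lag=0.2058, p_rlm_err=0.0000)
print(choice)
```

With these diagnostics the rule points to the spatial error specification: the robust LM (lag) test is not significant, while the robust LM (error) test is.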

Spatial Lag Regression

In [ ]:
morancrim = Moran(aux[dep_var_name], w_knn8_aux)
morancrim.I,morancrim.p_sim
Out[ ]:
(-0.01927877374249572, 0.253)
In [ ]:
fig, ax = moran_scatterplot(morancrim, aspect_equal=True)

ax.set_xlabel('crimenes_reportados_2023Q3')
ax.set_ylabel('SpatialLag_crimenes_reportados_2023Q3');
ax.set_xlim(-1, 3)
ax.set_ylim(-0.25, 0.25)
Out[ ]:
(-0.25, 0.25)
In [ ]:
lag_model = spreg.ML_Lag(
    # Dependent variable
    aux[dep_var_name].values,
    # Independent variables
    aux[ind_vars_names].values,
    w=w_knn8_aux,
    # Dependent variable name
    name_y=dep_var_name[0],
    # Independent variable name
    name_x=ind_vars_names
    )

print(lag_model.summary)
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG (METHOD = FULL)
-----------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :crimenes_reportados_2023Q3                Number of Observations:          91
Mean dependent var  :   1082.3626                Number of Variables   :           4
S.D. dependent var  :   2401.7287                Degrees of Freedom    :          87
Pseudo R-squared    :      0.9189
Spatial Pseudo R-squared:  0.8923
Log likelihood      :   -723.1477
Sigma-square ML     : 462627.3301                Akaike info criterion :    1454.295
S.E of regression   :    680.1671                Schwarz criterion     :    1464.339

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      9037.23668      6058.71067         1.49161         0.13580
           Poblacion         0.03586         0.00111        32.25801         0.00000
Life_excpectancy_2023      -130.63976        74.78643        -1.74684         0.08067
W_crimenes_reportados_2023Q3         0.51641         0.13526         3.81796         0.00013
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
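A note on reading the lag model: because the lagged dependent variable feeds back through the neighbors, the `Poblacion` coefficient is only the first-round effect. Assuming a row-standardized weights matrix (an assumption here, not something the report states), the average total effect of a covariate is its coefficient divided by (1 - rho). A rough sketch with the estimates printed above; the helper name is illustrative.

```python
def total_effect(beta, rho):
    """Average total impact in a spatial lag model with row-standardized W."""
    return beta / (1.0 - rho)

rho = 0.51641              # W_crimenes_reportados_2023Q3 coefficient from the report
beta_poblacion = 0.03586   # Poblacion coefficient from the report

multiplier = 1.0 / (1.0 - rho)
total_poblacion = total_effect(beta_poblacion, rho)
print(round(multiplier, 3), round(total_poblacion, 4))
```

Under that assumption the spatial multiplier is roughly 2, so the total effect of population on reported crimes is about twice the printed coefficient.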

Spatial Error Regression

In [ ]:
moranError = Moran(ols_model.u, w_knn8_aux)
moranError.I,moranError.p_sim
Out[ ]:
(0.17569287923669474, 0.001)
In [ ]:
fig, ax = moran_scatterplot(moranError, aspect_equal=True)
ax.set_xlabel('OlsError')
ax.set_ylabel('SpatialOlsError');
ax.set_xlim(-4, 4)
ax.set_ylim(-1.5, 1.5)
Out[ ]:
(-1.5, 1.5)
In [ ]:
err_model = spreg.ML_Error(
    # Dependent variable
    aux[dep_var_name].values,
    # Independent variables
    aux[ind_vars_names].values,
    w=w_knn8_aux,
    # Dependent variable name
    name_y=dep_var_name[0],
    # Independent variable name
    name_x=ind_vars_names
    )

print(err_model.summary)
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: ML SPATIAL ERROR (METHOD = full)
---------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :crimenes_reportados_2023Q3                Number of Observations:          91
Mean dependent var  :   1082.3626                Number of Variables   :           3
S.D. dependent var  :   2401.7287                Degrees of Freedom    :          88
Pseudo R-squared    :      0.9123
Log likelihood      :   -720.5941
Sigma-square ML     : 434370.2042                Akaike info criterion :    1447.188
S.E of regression   :    659.0677                Schwarz criterion     :    1454.721

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      7093.32694      6062.03915         1.17012         0.24195
           Poblacion         0.03539         0.00106        33.48993         0.00000
Life_excpectancy_2023       -99.54498        74.57100        -1.33490         0.18191
              lambda         0.62306         0.18042         3.45336         0.00055
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
/usr/local/lib/python3.10/dist-packages/scipy/optimize/_minimize.py:913: RuntimeWarning: Method 'bounded' does not support relative tolerance in x; defaulting to absolute tolerance.
  warn("Method 'bounded' does not support relative tolerance in x; "
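The three likelihood-based reports so far can be compared through their Akaike information criterion, AIC = 2k - 2·logL, where k is the "Number of Variables" and logL the log likelihood in each report. A minimal sketch that recomputes the printed AIC values and picks the lowest; the dictionary and function names are illustrative.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*logL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# (log likelihood, number of variables) copied from the three reports above
models = {
    "ols": (-726.177, 3),
    "ml_lag": (-723.1477, 4),
    "ml_error": (-720.5941, 3),
}

aic_by_model = {name: aic(ll, k) for name, (ll, k) in models.items()}
best_model = min(aic_by_model, key=aic_by_model.get)

for name, value in aic_by_model.items():
    print(f"{name:10s} AIC = {value:.3f}")
print("lowest AIC:", best_model)
```

The recomputed values match the "Akaike info criterion" lines in the reports, and the spatial error model has the lowest AIC, which agrees with what the LM diagnostics suggested.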

Spatial Error Regression, correcting for heteroskedasticity.

In [ ]:
error_Het_model = spreg.GM_Error_Het(
    # Dependent variable
    aux[dep_var_name].values,
    # Independent variables
    aux[ind_vars_names].values,
    # Spatial weights matrix
    w=w_knn8_aux,
    # Dependent variable name
    name_y=dep_var_name[0],
    # Independent variable name
    name_x=ind_vars_names,
)
print(error_Het_model.summary)
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: GM SPATIALLY WEIGHTED LEAST SQUARES (HET)
------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :crimenes_reportados_2023Q3                Number of Observations:          91
Mean dependent var  :   1082.3626                Number of Variables   :           3
S.D. dependent var  :   2401.7287                Degrees of Freedom    :          88
Pseudo R-squared    :      0.9122
N. of iterations    :           1                Step1c computed       :          No

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT    -65505157462222264.00000    228966392503277184.00000        -0.28609         0.77481
           Poblacion         0.03530         0.00454         7.78100         0.00000
Life_excpectancy_2023      -106.62558        64.71054        -1.64773         0.09941
              lambda         1.00000         0.00000    1465725428612823.75000         0.00000
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================